Skip to content

Disable database triggers during import #1996

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Dec 20, 2019

Conversation

bruceadams
Copy link
Contributor

  • This allows updated_at field values to be loaded during import.
    Before this change, updated_at was set to when the import ran,
    overwriting the values being loaded.
  • Also, the import completes in about one fifth the time
    (on my slow MacBook Air).

@rust-highfive
Copy link

Thanks for the pull request, and welcome! The Rust team is excited to review your changes, and you should hear from @sgrif (or someone else) soon.

If any changes to this PR are deemed necessary, please add them as extra commits. This ensures that the reviewer can see what has changed since they last reviewed the code. Due to the way GitHub handles out-of-date commits, this should also make it reasonably obvious what issues have or haven't been addressed. Large or tricky changes may require several passes of review and changes.

Please see the contribution instructions for more information.

@bruceadams
Copy link
Contributor Author

bruceadams commented Dec 19, 2019

I have not figured out how I can test this code change. What I did test was a hand edited import.sql from a download of db-dump.tar.gz. The code in this pull request is my attempt to enhance the template import.sql to generate what I ran.

Hand edited import.sql [and edited again to remove the unneeded COMMIT and BEGIN pairs]:

BEGIN;
    -- Disable triggers on each table.
    ALTER TABLE "categories" DISABLE TRIGGER ALL;
    ALTER TABLE "crates" DISABLE TRIGGER ALL;
    ALTER TABLE "keywords" DISABLE TRIGGER ALL;
    ALTER TABLE "metadata" DISABLE TRIGGER ALL;
    ALTER TABLE "reserved_crate_names" DISABLE TRIGGER ALL;
    ALTER TABLE "teams" DISABLE TRIGGER ALL;
    ALTER TABLE "users" DISABLE TRIGGER ALL;
    ALTER TABLE "badges" DISABLE TRIGGER ALL;
    ALTER TABLE "crates_categories" DISABLE TRIGGER ALL;
    ALTER TABLE "crates_keywords" DISABLE TRIGGER ALL;
    ALTER TABLE "crate_owners" DISABLE TRIGGER ALL;
    ALTER TABLE "versions" DISABLE TRIGGER ALL;
    ALTER TABLE "dependencies" DISABLE TRIGGER ALL;
    ALTER TABLE "version_authors" DISABLE TRIGGER ALL;
    ALTER TABLE "version_downloads" DISABLE TRIGGER ALL;

    -- Set defaults for non-nullable columns not included in the dump.
    ALTER TABLE "users" ALTER COLUMN "gh_access_token" SET DEFAULT '';

    -- Truncate all tables.
    TRUNCATE "categories" RESTART IDENTITY CASCADE;
    TRUNCATE "crates" RESTART IDENTITY CASCADE;
    TRUNCATE "keywords" RESTART IDENTITY CASCADE;
    TRUNCATE "metadata" RESTART IDENTITY CASCADE;
    TRUNCATE "reserved_crate_names" RESTART IDENTITY CASCADE;
    TRUNCATE "teams" RESTART IDENTITY CASCADE;
    TRUNCATE "users" RESTART IDENTITY CASCADE;
    TRUNCATE "badges" RESTART IDENTITY CASCADE;
    TRUNCATE "crates_categories" RESTART IDENTITY CASCADE;
    TRUNCATE "crates_keywords" RESTART IDENTITY CASCADE;
    TRUNCATE "crate_owners" RESTART IDENTITY CASCADE;
    TRUNCATE "versions" RESTART IDENTITY CASCADE;
    TRUNCATE "dependencies" RESTART IDENTITY CASCADE;
    TRUNCATE "version_authors" RESTART IDENTITY CASCADE;
    TRUNCATE "version_downloads" RESTART IDENTITY CASCADE;

    -- Import the CSV data.
    \copy "categories" ("category", "crates_cnt", "created_at", "description", "id", "path", "slug") FROM 'data/categories.csv' WITH CSV HEADER
    \copy "crates" ("created_at", "description", "documentation", "downloads", "homepage", "id", "max_upload_size", "name", "readme", "repository", "textsearchable_index_col", "updated_at") FROM 'data/crates.csv' WITH CSV HEADER
    \copy "keywords" ("crates_cnt", "created_at", "id", "keyword") FROM 'data/keywords.csv' WITH CSV HEADER
    \copy "metadata" ("total_downloads") FROM 'data/metadata.csv' WITH CSV HEADER
    \copy "reserved_crate_names" ("name") FROM 'data/reserved_crate_names.csv' WITH CSV HEADER
    \copy "teams" ("avatar", "github_id", "id", "login", "name") FROM 'data/teams.csv' WITH CSV HEADER
    \copy "users" ("gh_avatar", "gh_id", "gh_login", "id", "name") FROM 'data/users.csv' WITH CSV HEADER
    \copy "badges" ("attributes", "badge_type", "crate_id") FROM 'data/badges.csv' WITH CSV HEADER
    \copy "crates_categories" ("category_id", "crate_id") FROM 'data/crates_categories.csv' WITH CSV HEADER
    \copy "crates_keywords" ("crate_id", "keyword_id") FROM 'data/crates_keywords.csv' WITH CSV HEADER
    \copy "crate_owners" ("crate_id", "created_at", "owner_id", "owner_kind") FROM 'data/crate_owners.csv' WITH CSV HEADER
    \copy "versions" ("crate_id", "crate_size", "created_at", "downloads", "features", "id", "license", "num", "published_by", "updated_at", "yanked") FROM 'data/versions.csv' WITH CSV HEADER
    \copy "dependencies" ("crate_id", "default_features", "features", "id", "kind", "optional", "req", "target", "version_id") FROM 'data/dependencies.csv' WITH CSV HEADER
    \copy "version_authors" ("id", "name", "version_id") FROM 'data/version_authors.csv' WITH CSV HEADER
    \copy "version_downloads" ("date", "downloads", "version_id") FROM 'data/version_downloads.csv' WITH CSV HEADER

    -- Drop the defaults again.
    ALTER TABLE "users" ALTER COLUMN "gh_access_token" DROP DEFAULT;

    -- Reenable triggers on each table.
    ALTER TABLE "categories" ENABLE TRIGGER ALL;
    ALTER TABLE "crates" ENABLE TRIGGER ALL;
    ALTER TABLE "keywords" ENABLE TRIGGER ALL;
    ALTER TABLE "metadata" ENABLE TRIGGER ALL;
    ALTER TABLE "reserved_crate_names" ENABLE TRIGGER ALL;
    ALTER TABLE "teams" ENABLE TRIGGER ALL;
    ALTER TABLE "users" ENABLE TRIGGER ALL;
    ALTER TABLE "badges" ENABLE TRIGGER ALL;
    ALTER TABLE "crates_categories" ENABLE TRIGGER ALL;
    ALTER TABLE "crates_keywords" ENABLE TRIGGER ALL;
    ALTER TABLE "crate_owners" ENABLE TRIGGER ALL;
    ALTER TABLE "versions" ENABLE TRIGGER ALL;
    ALTER TABLE "dependencies" ENABLE TRIGGER ALL;
    ALTER TABLE "version_authors" ENABLE TRIGGER ALL;
    ALTER TABLE "version_downloads" ENABLE TRIGGER ALL;
COMMIT;

Copy link
Contributor

@smarnach smarnach left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the fix! I also noticed this problem a few days ago, but so far didn't even manage to file an issue. :)

You can test this on your local instance by running

cargo run --bin enqueue-job dump_db

while the background worker is running. The dump will end up in the ./local-uploads directory.

@bruceadams
Copy link
Contributor Author

It looks like I cannot run the background worker on macOS?

$ cargo run --bin background-worker
    Finished dev [unoptimized + debuginfo] target(s) in 0.28s
     Running `target/debug/background-worker`
<jemalloc>: option background_thread currently supports pthread only
memory allocation of 40 bytes failedAbort trap: 6

The README for jemallocator seems to agree that jemalloc is only supported on x86_64 Linux. I'll see if I can run on Linux (probably in Docker, since that's what I already have available). That'll take me a while; maybe not today.

@smarnach
Copy link
Contributor

The crates.io backend is only meant to work on Linux (since this is how we run it in production). I'm not sure any of the components work when ran natively on Mac.

There's a docker-compose config file available, which should make getting started with running in Docker straight-forward. There's also a bit of documentation for this in docs/CONTRIBUTING.md in this repo.,

@jtgeibel
Copy link
Member

jtgeibel commented Dec 19, 2019 via email

@carols10cents
Copy link
Member

The crates.io backend is only meant to work on Linux (since this is how we run it in production). I'm not sure any of the components work when ran natively on Mac.

I develop on macOS so I'm invested in making sure it works ;) It usually does :)

@bruceadams
Copy link
Contributor Author

After rebasing on master (which I haven't pushed up to this PR) 🤷‍

$ cargo run --bin background-worker
    Finished dev [unoptimized + debuginfo] target(s) in 1.31s
     Running `target/debug/background-worker`
Booting runner
Using local uploader, crate files will be in the local_uploads directory
Cloning index
thread 'main' panicked at 'Failed to clone index: Error { code: -1, klass: 2, message: "failed to resolve path \'file://./tmp/index-bare\': No such file or directory" }', src/libcore/result.rs:1165:5
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace.
$ ls -al tmp/index-bare
.rw-r--r-- 111 bruce 19 Dec 21:09 config
.rw-r--r--  73 bruce 19 Dec 21:09 description
.rw-r--r--  23 bruce 19 Dec 21:09 HEAD
drwxr-xr-x   - bruce 19 Dec 21:09 hooks
drwxr-xr-x   - bruce 19 Dec 21:09 info
drwxr-xr-x   - bruce 19 Dec 21:09 objects
drwxr-xr-x   - bruce 19 Dec 21:09 refs

@smarnach
Copy link
Contributor

@bruceadams
Copy link
Contributor Author

I did (that's what I was trying to show by listing tmp/index-bare earlier). More precisely, I did this:

$ script/init-local-index.sh
tmp/index-bare already exists, exiting
$ # Huh. Already there. Delete it and recreate...
$ rm -rf tmp
$ script/init-local-index.sh
Initializing repository in tmp/index-bare...
Creating checkout in tmp/index-bare...
Your local git index is ready to go!

Please refer to https://github.com/rust-lang/crates.io/blob/master/README.md for more info!
$ cargo run --bin background-worker
    Finished dev [unoptimized + debuginfo] target(s) in 0.82s
     Running `target/debug/background-worker`
Booting runner
Using local uploader, crate files will be in the local_uploads directory
Cloning index
thread 'main' panicked at 'Failed to clone index: Error { code: -1, klass: 2, message: "failed to resolve path \'file://./tmp/index-bare\': No such file or directory" }', src/libcore/result.rs:1165:5
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace.

- This allows `updated_at` field values to be loaded during import.
  Before this change, `updated_at` was set to when the import ran,
  overwriting the values being loaded.
- Also, the import completes in about one fifth the time
  (on my slow MacBook Air).
@bruceadams bruceadams force-pushed the no-trigger-during-import branch from b6664ce to 7415074 Compare December 20, 2019 13:00
@smarnach
Copy link
Contributor

@bruceadams Sorry, I obviously didn't read your previous comment carefully. But I indeed remember seeing this issue before, and it even is on my "bugs to file" list on my other laptop…

Apparently the reletaive file: URL is not resolved correctly. I couldn't make a relative URL work at all in this place, and needed to manually add the absolute path to the index to .env, e.g.

export GTIHUB_REPO_URL=file:///home/sven/src/crates.io/tmp/index-bare

Feel free to file a bug about this if you have time – we at least need to document this, and preferably find a way to use a relative path to make it Just Work.

@bruceadams
Copy link
Contributor Author

Oops! My earlier comment sounded impatient or irritated. Sorry! I really appreciate your engaging with me on this, especially when my available time is so choppy. I’ll work to communicate my appreciation and joy more clearly in the future.

It worked! 🎊

$ export GIT_REPO_URL=file:///Users/bruce/github/crates.io/tmp/index-bare
$ cargo run --bin background-worker
    Finished dev [unoptimized + debuginfo] target(s) in 0.30s
     Running `target/debug/background-worker`
Booting runner
Using local uploader, crate files will be in the local_uploads directory
Cloning index
Runner booted, running jobs
Database dump uploaded to db-dump.tar.gz.

Thank you!

And, the generated import.sql exactly matches my hand edited one. I done all the testing I can think of for this. (I also rebased this PR onto master as of the past hour or so.)

Is there anything else I can do to help this PR along? I, personally, am in no hurry to get this merged, since I have a simple workaround for the problem being fixed here.

@smarnach
Copy link
Contributor

smarnach commented Dec 20, 2019

No worries, you did not sound irritated, and I did not take any offence. I just apologised for unnecessarily increasing the number of communication round trips. :)

There is nothing else to do to move this along. Based on your testing results, I will simply merge this. :)

@bors r+

@bors
Copy link
Contributor

bors commented Dec 20, 2019

📌 Commit 7415074 has been approved by smarnach

@bors
Copy link
Contributor

bors commented Dec 20, 2019

⌛ Testing commit 7415074 with merge 0565e19...

bors added a commit that referenced this pull request Dec 20, 2019
Disable database triggers during import

- This allows `updated_at` field values to be loaded during import.
  Before this change, `updated_at` was set to when the import ran,
  overwriting the values being loaded.
- Also, the import completes in about one fifth the time
  (on my slow MacBook Air).
@bors
Copy link
Contributor

bors commented Dec 20, 2019

☀️ Test successful - checks-travis
Approved by: smarnach
Pushing 0565e19 to master...

@bors bors merged commit 7415074 into rust-lang:master Dec 20, 2019
@carols10cents
Copy link
Member

and it even is on my "bugs to file" list on my other laptop…

I want your bugs!!

@smarnach
Copy link
Contributor

I want your bugs!!

Sure, sure, you will get them! Several of them have actually been posted in the last few days, but I'll take a look this evening what's still remaining on my list.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants